Sequence analysis CASSIS and SMIPS: promoter-based prediction of secondary metabolite gene clusters in eukaryotic genomes

Authors

  • Thomas Wolf
  • Vladimir Shelest
  • Neetika Nath
  • Ekaterina Shelest
Abstract

Motivation: Secondary metabolites (SM) are structurally diverse natural products of high pharmaceutical importance. Genes involved in their biosynthesis are often organized in clusters, i.e., they are co-localized and co-expressed. In silico cluster prediction in eukaryotic genomes remains problematic, mainly due to the high variability of the clusters’ content and the lack of other distinguishing sequence features.

Results: We present Cluster Assignment by Islands of Sites (CASSIS), a method for SM cluster prediction in eukaryotic genomes, and Secondary Metabolites by InterProScan (SMIPS), a tool for genome-wide detection of SM key enzymes (‘anchor’ genes): polyketide synthases, non-ribosomal peptide synthetases and dimethylallyl tryptophan synthases. Unlike other tools based on protein similarity, CASSIS exploits the idea of co-regulation of the cluster genes, which assumes the existence of common regulatory patterns in the cluster promoters. The method searches for ‘islands’ of enriched cluster-specific motifs in the vicinity of anchor genes. It was validated in a series of cross-validation experiments and showed high sensitivity and specificity.

Availability and implementation: CASSIS and SMIPS are freely available at https://sbi.hki-jena.de/cassis.

Contact: [email protected] or [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
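The ‘islands’ idea from the Results can be illustrated with a toy sketch. This is not the CASSIS implementation: the function name `find_motif_island`, the boolean per-promoter input and the `max_gap` tolerance are assumptions made for illustration only. Given a flag per gene promoter saying whether a shared motif was found there, the sketch extends outward from the anchor gene in both directions until too many consecutive motif-free promoters are seen.

```python
from typing import List, Tuple


def find_motif_island(has_motif: List[bool], anchor: int, max_gap: int = 1) -> Tuple[int, int]:
    """Return inclusive (start, end) gene indices of a motif 'island' around the anchor.

    has_motif[i] is True if the promoter of gene i carries the shared motif.
    The anchor gene is always part of the island (an assumption of this sketch).
    Up to `max_gap` consecutive motif-free promoters are tolerated (hypothetical parameter).
    """
    if not 0 <= anchor < len(has_motif):
        raise ValueError("anchor index out of range")

    def extend(step: int) -> int:
        # Walk away from the anchor and remember the last promoter with a motif hit.
        last_hit = anchor
        gap = 0
        i = anchor + step
        while 0 <= i < len(has_motif):
            if has_motif[i]:
                last_hit = i
                gap = 0
            else:
                gap += 1
                if gap > max_gap:
                    break
            i += step
        return last_hit

    return extend(-1), extend(+1)


# Toy example: nine neighbouring genes, anchor gene at index 4.
promoters = [False, True, False, True, True, True, False, False, True]
island = find_motif_island(promoters, anchor=4, max_gap=1)
```

In this toy setting the island boundaries double as the predicted cluster borders; a real method would also have to choose the motif itself and assess its enrichment, which the sketch leaves out.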

Similar resources

The in Silico Characterization of a Salicylic Acid Analogue Coding Gene Clusters in Selected Pseudomonas Fluorescens Strains

Background: Microbial genome sequences provide a solid in silico framework for interpreting the biosynthetic potential of their drug-like chemical scaffolds. The Pseudomonas fluorescens species is metabolically versatile and produces therapeutically important natural products. Objectives: The main objective of the present study was to mine the publicly available data of P. fluorescens stra...

antiSMASH 4.0—improvements in chemistry prediction and gene cluster boundary identification

Many antibiotics, chemotherapeutics, crop protection agents and food preservatives originate from molecules produced by bacteria, fungi or plants. In recent years, genome mining methodologies have been widely adopted to identify and characterize the biosynthetic gene clusters encoding the production of such compounds. Since 2011, the 'antibiotics and secondary metabolite analysis shell-antiSMAS...

Motif-independent de novo detection of secondary metabolite gene clusters—toward identification from filamentous fungi

Secondary metabolites are produced mostly by clustered genes that are essential to their biosynthesis. The transcriptional expression of these genes is often cooperatively regulated by a transcription factor located inside or close to a cluster. Most of the secondary metabolism biosynthesis (SMB) gene clusters identified to date contain so-called core genes with distinctive sequence features, s...

Genome mining: Prediction of lipopeptides and polyketides from Bacillus and related Firmicutes

Bacillus and related genera in the Bacillales within the Firmicutes harbor a variety of secondary metabolite gene clusters encoding polyketide synthases and non-ribosomal peptide synthetases responsible for a remarkably diverse number of polyketides (PKs) and lipopeptides (LPs). These compounds may be utilized for medical and agricultural applications. Here, we summarize the knowledge on structur...


Publication date: 2016